Last updated: 2025-05-28
Checks: 6 1
Knit directory: /mnt/central_nas/projects/type1_diabetes/nathan/BlockCourse/2024_BlockCourse/T1D_analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 980aa9d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: T1D_analysis/20250528_CD8a_INS_density_Immune.png
Ignored: T1D_analysis/20250528_CD8a_INS_density_Islet.png
Ignored: T1D_analysis/figure/
Ignored: processing/
Untracked files:
Untracked: T1D_analysis/05_CellCategories_cells_Compressed.html
Unstaged changes:
Modified: T1D_analysis/01_ImportData_cells_Compressed.html
Modified: T1D_analysis/02_SpilloverCompensation_cells_Compressed.html
Modified: T1D_analysis/03_TransformCorrect_cells_Compressed.html
Modified: T1D_analysis/04_QualityControl_cells_Compressed.html
Modified: T1D_analysis/05_CellCategories_cells_Compressed.Rmd
Modified: T1D_analysis/05_CellCategories_cells_Uncompressed.Rmd
Modified: T1D_analysis/06_CellTypes_cells_Compressed.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (T1D_analysis/06_CellTypes_cells_Compressed.Rmd) and HTML (T1D_analysis/06_CellTypes_cells_Compressed.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 955a62a | nathansteenbuck | 2025-05-28 | full update 2025 annotation |
| Rmd | 3c04956 | nathansteenbuck | 2024-11-21 | final script added |
| Rmd | e21a12b | nathansteenbuck | 2024-11-20 | edits scripts |
| Rmd | f9ec97c | nathansteenbuck | 2024-11-20 | adjust descriptions |
| Rmd | 43dff72 | nathansteenbuck | 2024-11-19 | update analysis_pipeline |
| Rmd | 08f607f | nathansteenbuck | 2024-11-19 | update channel decomposition + installation |
| Rmd | abf8b5b | nathansteenbuck | 2024-10-23 | backbone_3 |
Rscript -e “rmarkdown::render(‘2024_BlockCourse/T1D_analysis/06_CellTypes_cells_Compressed.Rmd’)”
In the following scripts, cell types are attributed to all cells in the dataset in an iterative way:
05_CellCategories_cells_Compressed: Attribution of main categories (immune, islet, other (exocrine+stroma)).06_CellTypes_cells_Compressed: Attribution of cell types (this script).In both scripts this is performed by performing PhenoGraph clustering using the Rphenoannoy package.
The resulting cell categories, which are used in downstream analyses are stored as colData(spe)$cell_category, and the cell types are stored as colData(spe)$cell_type.
suppressPackageStartupMessages(c(
library(data.table),
library(dplyr),
library(SpatialExperiment),
library(parallel),
library(tictoc),
library(purrr),
library(furrr)
))
# Paths
if (!dir.exists(paths$folder_script)) dir.create(paths$folder_script)
plotsave_param$path <- paths$folder_script
plotsave_param_large$path <- paths$folder_script
# Misc settings
today <- gsub("-", "", Sys.Date())
Load the SpatialExperiment (SPE) object saved at the previous step.
fn_spe <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, ".rds"))
spe <- readRDS(fn_spe)
print(spe)
class: SpatialExperiment
dim: 27 139629
metadata(2): spillover_matrix subset
assays(6): counts compcounts ... scaled fastMNN_case_id
rownames(27): H3 CD44_GCG ... DNA3 PPY
rowData names(10): channel metal ... shortname channel_name
colnames(139629): 6238_Compressed_001_1 6238_Compressed_001_2 ...
6396_Compressed_030_1445 6396_Compressed_030_1446
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id
suppressPackageStartupMessages(c(
library(ggplot2),
library(scater),
library(scuttle),
library(scran),
library(igraph),
library(Rphenoannoy),
library(clustree),
library(ranger),
library(patchwork),
library(BiocParallel),
library(tictoc)
))
Select channels (channels_clust), assays (assay_sel) and clustering methods (methods_sel) to use.
methods_sel <- c("Pheno1")
assay_sel <- c("scaled")
names(assay_sel) <- c("scaled")
writeLines(c("Assays:", assay_sel[assay_sel %in% assayNames(spe)],
assay_sel[assay_sel %in% reducedDimNames(spe)]))
Assays:
scaled
dimred_sel <- c("UMAP")
writeLines(c("\nReduced dimensions:", dimred_sel))
Reduced dimensions:
UMAP
channels <- rownames(spe)[!(grepl("DNA|H3", rownames(spe)))]
cat(c("\nChannels:", channels[channels %in% rownames(spe)]))
Channels: CD44_GCG GLUT1 CD99 CD68 MPO SMA CD20_SST AMY CD3e_NKX6_1 CK19 PDX1 SYP CD45RO FOXP3 CD45RA CD8a_INS CA9 IAPP CD4_ProINS CD31 Ecdh PTPRN PCSK2 PPY
cat(c("\nNumber of channels:", length(channels)))
Number of channels: 24
Reduced dimension plots showing marker expression that generated by the previous script can be used to select the most relevant markers for clustering.
channels_clust <- rownames(rowData(spe)[rowData(spe)$clustering == 1, ])
cat(c("\nChannels used for unsupervised clustering:",
channels_clust[channels_clust %in% rownames(spe)]))
Channels used for unsupervised clustering: CD44_GCG GLUT1 CD99 CD68 MPO SMA CD20_SST AMY CD3e_NKX6_1 CK19 PDX1 SYP CD45RO FOXP3 CD45RA CD8a_INS CA9 IAPP CD4_ProINS CD31 Ecdh PTPRN PCSK2 PPY
Now, we create new SCE objects for each cell category
sce_isl <- spe[, colData(spe)$cell_category == "Islet"]
sce_immune <- spe[, colData(spe)$cell_category == "Immune"]
sce_exo <- spe[, colData(spe)$cell_category == "Exocrine"]
sce_stroma <- spe[, colData(spe)$cell_category == "Stroma"]
sce_other <- spe[, colData(spe)$cell_category == "Other"]
We will now decompose the channels into their respective signals and noise components. We have the following assumptions:
Assuming S(CD20) and S(SST) represent the true signals and N(CD20) and N(SST) represent the noise components, we can now write the following equations:
channels_compressed <- c("CD44_GCG", "CD8a_INS", "CD4_ProINS", "CD3e_NKX6_1", "CD20_SST")
channels_uncompressed <- c("CD44", "GCG", "CD8a", "INS", "CD4", "ProINS", "CD3e", "NKX6_1", "CD20", "SST")
Fit 2-component GMMs with unequal variance to each channel in each cell category to estimate the means(“M_”) and variances (“V_”) of the signal (“S_”) and noise (“N_”) components.
In addition, extract the threshold value that separates the signal from the noise.
coefficients <- list()
coefficients <- map(channels_compressed, \(channel) {
# Immune:
S_Immune_N_Islet <- mclust::densityMclust(data = assay(sce_immune, "exprs")[channel, ], G = 2, modelNames = "V")
M_S_Immune_N_Islet <- S_Immune_N_Islet$parameters$mean[2]
V_S_Immune_N_Islet <- S_Immune_N_Islet$parameters$variance$sigmasq[2]
# Extract maximal value for the noise -> threshold value.
threshold_Immune <- max(S_Immune_N_Islet$data[S_Immune_N_Islet$classification == 1,])
# Islet:
N_Immune_S_Islet <- mclust::densityMclust(data = assay(sce_isl, "exprs")[channel, ], G = 2, modelNames = "V")
M_N_Immune_S_Islet <- N_Immune_S_Islet$parameters$mean[2]
V_N_Immune_S_Islet <- N_Immune_S_Islet$parameters$variance$sigmasq[2]
threshold_Islet <- max(N_Immune_S_Islet$data[N_Immune_S_Islet$classification == 1,])
# Exocrine:
N_Immune_N_Islet <- mclust::densityMclust(data = assay(sce_exo, "exprs")[channel, ], G = 2, modelNames = "V")
M_N_Immune_N_Islet <- N_Immune_N_Islet$parameters$mean[1]
V_N_Immune_N_Islet <- N_Immune_N_Islet$parameters$variance$sigmasq[1]
# Stroma:
I_Stroma <- mclust::densityMclust(data = assay(sce_stroma, "exprs")[channel, ], G = 2, modelNames = "V")$parameters$mean[1]
coefficients <- c(M_S_Immune_N_Islet, V_S_Immune_N_Islet,
M_N_Immune_S_Islet, V_N_Immune_S_Islet,
M_N_Immune_N_Islet, V_N_Immune_N_Islet,
threshold_Islet, threshold_Immune)
names(coefficients) <- c("M_S_Immune_N_Islet", "V_S_Immune_N_Islet",
"M_N_Immune_S_Islet", "V_N_Immune_S_Islet",
"M_N_Immune_N_Islet", "V_N_Immune_N_Islet",
"threshold_Islet", "threshold_Immune")
coefficients
}) |>
bind_rows() |>
mutate(channel = channels_compressed)
Inspect the coefficients.
coefficients
# A tibble: 5 × 9
M_S_Immune_N_Islet V_S_Immune_N_Islet M_N_Immune_S_Islet V_N_Immune_S_Islet
<dbl> <dbl> <dbl> <dbl>
1 0.362 0.126 1.98 0.778
2 0.539 0.186 1.36 0.566
3 0.370 0.0507 2.04 1.51
4 0.179 0.0101 0.336 0.0218
5 0.402 0.382 2.84 2.34
# ℹ 5 more variables: M_N_Immune_N_Islet <dbl>, V_N_Immune_N_Islet <dbl>,
# threshold_Islet <dbl>, threshold_Immune <dbl>, channel <chr>
-> INS signal and CD8a noise.
N_Immune_S_Islet <- mclust::densityMclust(data = assay(sce_isl, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")
p <- tibble(counts = assay(sce_isl, "exprs")["CD8a_INS", ]) |>
ggplot(aes(x = counts)) + geom_density() +
labs(x = "arcinsh(counts): CD8a_INS",
y = "Density") +
mytheme$standard_new() +
geom_vline(xintercept = N_Immune_S_Islet$parameters$mean[2], color = "blue") +
geom_vline(xintercept = max(N_Immune_S_Islet$data[N_Immune_S_Islet$classification == 1,]), color = "orange")
print(p)
fn <- paste0(today, "_CD8a_INS_density_Islet.png")
ggsave(fn, p, width = 10, height = 10)
Show CD8a_INS Distribution for Immune Cells: -> CD8a signal and INS noise.
N_Islet_N_Immune <- mclust::densityMclust(data = assay(sce_immune, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")
p <- tibble(counts = assay(sce_immune, "exprs")["CD8a_INS", ]) |>
ggplot(aes(x = counts)) + geom_density() +
labs(x = "arcinsh(counts): CD8a_INS",
y = "Density") +
mytheme$standard_new() +
geom_vline(xintercept = N_Islet_N_Immune$parameters$mean[2], color = "blue") +
geom_vline(xintercept = max(N_Islet_N_Immune$data[N_Islet_N_Immune$classification == 1,]), color = "orange")
print(p)
fn <- paste0(today, "_CD8a_INS_density_Immune.png")
ggsave(fn, p, width = 10, height = 10)
Show CD8a_INS Distribution for Exocrine Cells:
N_Islet_N_Immune <- mclust::densityMclust(data = assay(sce_exo, "exprs")["CD8a_INS", ], G = 2, modelNames = "V")
p <- tibble(counts = assay(sce_exo, "exprs")["CD8a_INS", ]) |>
ggplot(aes(x = counts)) + geom_density() +
labs(x = "arcinsh(counts): CD8a_INS",
y = "Density") +
mytheme$standard_new() +
geom_vline(xintercept = N_Islet_N_Immune$parameters$mean[1], color = "blue") +
geom_vline(xintercept = max(N_Islet_N_Immune$data[N_Islet_N_Immune$classification == 1,]), color = "orange")
print(p)
fn <- paste0(today, "_CD8a_INS_density_Immune.png")
ggsave(fn, p, width = 10, height = 10)
Now, we will decompose the channels into their respective signals and noise components. Here, we estimate the means and variances of the signal and noise components for each channel in each cell category.
estimated_coeffs <- purrr::pmap(coefficients |> dplyr::select(M_S_Immune_N_Islet,
V_S_Immune_N_Islet,
M_N_Immune_S_Islet,
V_N_Immune_S_Islet,
M_N_Immune_N_Islet,
V_N_Immune_N_Islet),
function(M_S_Immune_N_Islet, V_S_Immune_N_Islet, M_N_Immune_S_Islet, V_N_Immune_S_Islet,
M_N_Immune_N_Islet, V_N_Immune_N_Islet) {
# As additional constraints we introduce scaling factors between the signal + noise components.
# Scaling factor between Islet signal (e.g. SST) and Immune signal (e.g. CD20) (estimated from the data)
fraction_mu <- (M_N_Immune_S_Islet - M_N_Immune_N_Islet/2) / (M_S_Immune_N_Islet- M_N_Immune_N_Islet/2)
# Scaling factor between sigma SST signal and sigma CD20 signal (estimated from the data)
# -> This is a strong assumption and might not be true in reality.
fraction_sigma <- (V_S_Immune_N_Islet - V_N_Immune_N_Islet/2)/(V_N_Immune_S_Islet - V_N_Immune_N_Islet/2)
# -> we assume that the same fractions apply for the noise.
message("Fraction mu: ", fraction_mu, " Fraction sigma: ", fraction_sigma)
message("1: ", M_S_Immune_N_Islet,
"2: ", V_S_Immune_N_Islet,
"3: ", M_N_Immune_S_Islet,
"4: ", V_N_Immune_S_Islet,
"5: ", M_N_Immune_N_Islet,
"6: ", V_N_Immune_N_Islet)
# Augmented coefficient matrix
A <- rbind(
c(1, 0, 0, 1, 0, 0, 0, 0), # M_S_Immune_N_Islet = mu_S_CD20 + mu_N_SST
c(0, 1, 1, 0, 0, 0, 0, 0), # M_N_Immune_S_Islet = mu_N_CD20 + mu_S_SST
c(0, 0, 1, 1, 0, 0, 0, 0), # M_N_Islet_N_Immune = mu_N_SST + mu_N_CD20
c(0, 0, 0, 0, 1, 0, 0, 1), # V_S_Immune_N_Islet = sigma2_S_CD20 + sigma2_N_SST
c(0, 0, 0, 0, 0, 1, 1, 0), # V_N_Immune_S_Islet = sigma2_N_CD20 + sigma2_S_SST
c(0, 0, 0, 0, 0, 0, 1, 1), # V_N_Immune_N_Islet = sigma2_N_SST + sigma2_N_CD20
c(fraction_mu, -1, 0, 0, 0, 0, 0, 0), # fraction_mu * mu_S_CD20 - S_SST = 0
c(0,0,fraction_mu,-1, 0, 0, 0, 0) # fraction_mu * mu_N_CD20 - mu_N_SST = 0
)
message("A: ", A)
# Right-hand side (means and variances combined)
B <- c(
M_S_Immune_N_Islet, M_N_Immune_S_Islet, M_N_Immune_N_Islet,
V_S_Immune_N_Islet, V_N_Immune_S_Islet, V_N_Immune_N_Islet, 0,0
)
message("B: ", B)
# Solve by non-negative least squares. Counts should not be negative here.
fit_nnls <- nnls::nnls(A, B) # Solve the joint system
coefs <- coef(fit_nnls)
names(coefs) <- c("mu_S_Immune", "mu_S_Islet", "mu_N_Immune", "mu_N_Islet",
"sigma2_S_Immune", "sigma2_S_Islet", "sigma2_N_Immune", "sigma2_N_Islet")
coefs
}) |> dplyr::bind_rows() |> dplyr::mutate(channel = channels_compressed)
Fraction mu: 5.68701986522426 Fraction sigma: 0.161916046350452
1: 0.3620610033206792: 0.1262384154254513: 1.983029031856694: 0.7782794776979875: 0.03243813281657716: 0.000530923996168627
A: 1000005.687019865224260010000-1001100005.687019865224261010000-100010000000010000000110000010100
B: 0.3620610033206791.983029031856690.03243813281657710.1262384154254510.7782794776979870.00053092399616862700
Fraction mu: 2.62194622232602 Fraction sigma: 0.328719998370566
1: 0.538933147934292: 0.1861459996342543: 1.362740461793784: 0.5656324971052435: 0.06204061370993216: 0.000629502165911279
A: 1000002.621946222326020010000-1001100002.621946222326021010000-100010000000010000000110000010100
B: 0.538933147934291.362740461793780.06204061370993210.1861459996342540.5656324971052430.00062950216591127900
Fraction mu: 6.57375878571027 Fraction sigma: 0.0327536893663039
1: 0.369772425233842: 0.05065069143029433: 2.041540221581654: 1.508353148110645: 0.1396739695699246: 0.00257754606802299
A: 1000006.573758785710270010000-1001100006.573758785710271010000-100010000000010000000110000010100
B: 0.369772425233842.041540221581650.1396739695699240.05065069143029431.508353148110640.0025775460680229900
Fraction mu: 2.12299935886517 Fraction sigma: 0.451489332943297
1: 0.1785272981009582: 0.01005230764545023: 0.3355035028517154: 0.02182202433564265: 0.07748862225623516: 0.000728869816702751
A: 1000002.122999358865170010000-1001100002.122999358865171010000-100010000000010000000110000010100
B: 0.1785272981009580.3355035028517150.07748862225623510.01005230764545020.02182202433564260.00072886981670275100
Fraction mu: 7.2934056547457 Fraction sigma: 0.163211052076641
1: 0.4015616715471732: 0.3823908631692243: 2.83700691148874: 2.34233577226585: 0.02915599582664976: 0.000228916649075631
A: 1000007.29340565474570010000-1001100007.29340565474571010000-100010000000010000000110000010100
B: 0.4015616715471732.83700691148870.02915599582664970.3823908631692242.34233577226580.00022891664907563100
Inspect the estimated coefficients. Please note: This doesnt work perfect yet, and is the first attempt at algorithmic decomposition.
estimated_coeffs
# A tibble: 5 × 9
mu_S_Immune mu_S_Islet mu_N_Immune mu_N_Islet sigma2_S_Immune sigma2_S_Islet
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.348 1.98 0.00412 0.0221 0.126 0.778
2 0.510 1.34 0.0164 0.0391 0.186 0.566
3 0.307 2.02 0.0154 0.0962 0.0481 1.51
4 0.142 0.306 0.0246 0.0473 0.0101 0.0211
5 0.388 2.83 0.00290 0.0202 0.382 2.34
# ℹ 3 more variables: sigma2_N_Immune <dbl>, sigma2_N_Islet <dbl>,
# channel <chr>
joint_coeffs <- estimated_coeffs |>
dplyr::left_join(coefficients, by = "channel") |>
dplyr::select(-dplyr::starts_with("M_"), -dplyr::starts_with("V_"))
full_coeffs <- joint_coeffs |>
mutate(sigma2 = sigma2_N_Immune + sigma2_N_Islet) |>
mutate(sigma2_N_Immune = sigma2/2) |>
mutate(sigma2_N_Islet = sigma2/2) |>
select(-sigma2)
full_coeffs
# A tibble: 5 × 11
mu_S_Immune mu_S_Islet mu_N_Immune mu_N_Islet sigma2_S_Immune sigma2_S_Islet
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.348 1.98 0.00412 0.0221 0.126 0.778
2 0.510 1.34 0.0164 0.0391 0.186 0.566
3 0.307 2.02 0.0154 0.0962 0.0481 1.51
4 0.142 0.306 0.0246 0.0473 0.0101 0.0211
5 0.388 2.83 0.00290 0.0202 0.382 2.34
# ℹ 5 more variables: sigma2_N_Immune <dbl>, sigma2_N_Islet <dbl>,
# channel <chr>, threshold_Islet <dbl>, threshold_Immune <dbl>
After estimating the coefficients, we can now estimate the actual counts for each cell.
These are helper functions to estimate: - The most likely S_Islet and N_Immune for a given positive Islet cell. - The most likely N_Islet and S_Immune for a given positive Immune cell. - The most likely N_Islet and N_Immune in case of noise only.
# Function to calculate the most likely S_Islet and N_Immune
likelihood_s_islet_n_immune <- function(N_Immune, Immune_Islet_obs, mu_Islet, sigma_Islet, mu_N_Immune, sigma_N_Immune) {
S_Islet <- Immune_Islet_obs - N_Immune # Compute N_SST from the constraint
if (N_Immune < 0) return(0) # Noise cannot be negative
P_S_Islet <- dnorm(S_Islet, mean = mu_Islet, sd = sigma_Islet)
P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
return(P_S_Islet * P_N_Immune)
}
# Function to calculate the most likely N_Islet and S_Immune
likelihood_n_islet_s_immune <- function(N_Islet, Immune_Islet_obs, mu_Immune, sigma_Immune, mu_N_Islet, sigma_N_Islet) {
S_Immune <- Immune_Islet_obs - N_Islet # Compute N_SST from the constraint
if (N_Islet < 0) return(0) # Noise cannot be negative
P_S_Immune <- dnorm(S_Immune, mean = mu_Immune, sd = sigma_Immune)
P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
return(P_S_Immune * P_N_Islet)
}
# Function to calculate the most likely N_Islet and N_Immune in case of noise only.
likelihood_n_islet_n_immune <- function(N_Immune, Immune_Islet_obs, mu_N_Immune, sigma_N_Immune, mu_N_Islet, sigma_N_Islet) {
N_Islet <- Immune_Islet_obs - N_Immune # Compute N_SST from the constraint
if (N_Immune < 0) return(0) # Noise cannot be negative
P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
return(P_N_Immune * P_N_Islet)
}
# Function to calculate the most likely N_Islet and N_Immune in case of noise only.
likelihood_n_islet_n_immune_2 <- function(N_Islet, Immune_Islet_obs, mu_N_Immune, sigma_N_Immune, mu_N_Islet, sigma_N_Islet) {
N_Immune <- Immune_Islet_obs - N_Islet # Compute N_SST from the constraint
if (N_Islet < 0) return(0) # Noise cannot be negative
P_N_Immune <- dnorm(N_Immune, mean = mu_N_Immune, sd = sigma_N_Immune)
P_N_Islet <- dnorm(N_Islet, mean = mu_N_Islet, sd = sigma_N_Islet)
return(P_N_Immune * P_N_Islet)
}
# IF cell_category is Islet and channel is CD20_SST:
# Above threshold: signal intensity is SST signal + CD20 noise
# Below threshold: signal intensity is SST noise + CD20 noise
library(furrr)
set.seed(222)
options <- furrr::furrr_options(seed = seed)
# Islet cells.
results <-
purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
message(channel)
furrr::future_map(1:length(colnames(sce_isl)), \(i) {
# Get compressed value per cell.
Immune_Islet_obs <- assay(sce_isl, "exprs")[channel, ][i]
# Define optimization bounds. between [0, threshold_Immune-Noise]
lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
interval <- c(lower_bound1, upper_bound1)
# Catch edge-case.
if(Immune_Islet_obs == 0) {
tibble(Islet = 0, Immune = 0)
}
# Above threshold: signal intensity is SST signal + CD20 noise
else if(Immune_Islet_obs > threshold_Islet){
result <- optimize(
f = likelihood_s_islet_n_immune,
interval = interval,
maximum = TRUE,
Immune_Islet_obs = Immune_Islet_obs,
mu_Islet = mu_S_Islet,
sigma_Islet = sigma2_S_Islet,
mu_N_Immune = mu_N_Immune,
sigma_N_Immune = sigma2_N_Immune
)
# Extract the results
N_Immune_opt <- result$maximum
S_Islet_opt <- Immune_Islet_obs - N_Immune_opt
tibble(Immune = N_Immune_opt, Islet = S_Islet_opt)
}
# Below threshold: signal intensity is SST noise + CD20 noise
else{
result <- optimize(
f = likelihood_n_islet_n_immune,
interval = interval, # N_Immune cannot exceed the N_Immune + Sigma_Immune
maximum = TRUE,
Immune_Islet_obs = Immune_Islet_obs,
mu_N_Immune = mu_N_Immune,
sigma_N_Immune = sigma2_N_Immune,
mu_N_Islet = mu_N_Islet,
sigma_N_Islet = sigma2_N_Islet
)
# Extract the results
N_Immune_opt <- result$maximum
N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
tibble(Immune = N_Immune_opt, Islet = N_Islet_opt)
}
}, .options = options) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_optim <- results |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Islet` -> `Islet...9`
• `Immune` -> `Immune...10`
colnames(results_optim) <- channels_uncompressed
results_optim
# A tibble: 20,999 × 10
CD44 GCG CD8a INS CD4 ProINS CD3e NKX6_1 CD20 SST
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.00431 0.0309 0.0167 0.0559 0.0142 0.0580 0.0249 0.0295 0 0
2 0.00393 0.0163 0.0167 -0.00432 0.0166 0.0176 0.0249 -0.00364 0 0
3 0.00410 2.41 0.0164 0.171 0.0154 0.424 0.0249 0.00666 0.0407 0.00297
4 0.00410 2.84 0.0164 0.249 0.0154 0.514 0.0249 0.0520 0.0861 0.00297
5 0.00410 2.71 0.0164 0.485 0.0154 1.73 0.0249 0.120 0.0890 0.00297
6 0.00414 1.19 0.0167 0.0515 0.0166 0.228 0.0243 0.0354 0.0440 0.00297
7 0.00410 2.14 0.0164 2.25 0.0154 1.27 0.0249 0.179 0.119 0.00297
8 0.00431 0.235 0.0164 0.845 0.0154 2.15 0.0247 0.690 0.0785 0.00297
9 0.00414 1.82 0.0167 0.0909 0.0166 0.276 0.0249 0.00599 0.0215 0.00297
10 0.00431 0.273 0.0164 2.38 0.0154 2.40 0.0246 0.294 0.116 0.00297
# ℹ 20,989 more rows
Check the results.
channel_oi <- "CD8a_INS"
# Split channels_oi into Immune and Islet
channel_oi2 <- channel_oi |> stringr::str_split("_") |> unlist()
channel_Immune <- channel_oi2[1]
channel_Islet <- ifelse(channel_oi == "CD8a_INS", "INS", channel_oi2[2])
## Get parameters for the channel of interest
mu_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Islet")
mu_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Immune")
mu_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Islet")
mu_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Immune")
threshold_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Immune")
threshold_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Islet")
sigma2_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Immune")
sigma2_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Islet")
sigma2_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Immune")
sigma2_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Islet")
# Compressed channel:
ggplot(tibble(x = assay(sce_isl, "exprs")[channel_oi, ]), aes(x=x)) + geom_density() +
geom_vline(xintercept = mu_S_Islet + mu_N_Immune, color = "yellow") + # Islet signal + Immune noise
geom_vline(xintercept = mu_S_Islet, color = "red") + # Islet signal (INS_S)
geom_vline(xintercept = threshold_Islet, color = "green") + # threshold.
geom_vline(xintercept = mu_N_Islet, color = "blue") + # mean Islet noise (INS_N)
ggtitle(channel_oi) # Density plot.
# Signal Islet Channel
ggplot(results_optim, aes(x = get(channel_Islet))) + geom_density() +
geom_vline(xintercept = mu_S_Islet, color = "red") + # INS signal.
geom_vline(xintercept = threshold_Islet, color = "green") + # threshold.
geom_vline(xintercept = mu_N_Islet, color = "blue") + # mean Islet noise (INS_N).
ggtitle("Islet")
# Noise Immune Channel -> this doesnt work great.
ggplot(results_optim, aes(x = get(channel_Immune))) + geom_density() +
geom_vline(xintercept = mu_N_Immune, color = "blue") +
ggtitle("Immune")
Assign the counts.
df <- as.data.frame(t(assay(sce_isl, "exprs")))
# Now add the decomposed channels.
# We do not yet remove the compressed channels.
df_decomposed <- df |>
mutate(CD44 = results_optim$CD44) |>
mutate(CD8a = results_optim$CD8a) |>
mutate(CD4 = results_optim$CD4) |>
mutate(CD3e = results_optim$CD3e) |>
mutate(CD20 = results_optim$CD20) |>
mutate(GCG = results_optim$GCG) |>
mutate(INS = results_optim$INS) |>
mutate(SST = results_optim$SST) |>
mutate(NKX6_1 = results_optim$NKX6_1) |>
mutate(ProINS = results_optim$ProINS)
# Uncomrpessed panel now as rowData.
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)
# Create a new SCE object.
sce_isl_2 <- SpatialExperiment(
assays = list(exprs = as.matrix(t(df_decomposed))),
colData = colData(sce_isl),
spatialCoords = spatialCoords(sce_isl),
metadata = metadata(sce_isl))
Perform the same operation for the Immune cells.
set.seed(222)
options <- furrr::furrr_options(seed = seed)
# Immune:
results_immun <- purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
message(channel)
results_immune <- furrr::future_map(1:length(colnames(sce_immune)), \(i) {
# Get compressed value per cell.
Immune_Islet_obs <- assay(sce_immune, "exprs")[channel, ][i]
# Catch edge-case.
if(Immune_Islet_obs == 0) {
tibble(Immune = 0, Islet = 0)
}
# Above threshold: signal intensity is CD20 signal + SST noise
else if(Immune_Islet_obs > threshold_Immune){
# Define optimization bounds.
# Noise cannot be lower than 0, and within the signal threshold.
lower_bound1 <- ifelse(mu_N_Islet - sigma2_N_Islet < 0, 0, mu_N_Islet - sigma2_N_Islet)
upper_bound1 <- ifelse(mu_N_Islet + sigma2_N_Islet > threshold_Islet, threshold_Islet, mu_N_Islet + sigma2_N_Islet)
interval <- c(lower_bound1, upper_bound1)
result <- optimize(
f = likelihood_n_islet_s_immune,
interval = interval,
maximum = TRUE,
Immune_Islet_obs = Immune_Islet_obs,
mu_Immune = mu_S_Immune,
sigma_Immune = sigma2_S_Immune,
mu_N_Islet = mu_N_Islet,
sigma_N_Islet = sigma2_N_Islet
)
# Extract the results
N_Islet_opt <- result$maximum
S_Immune_opt <- Immune_Islet_obs - N_Islet_opt
tibble(Immune = S_Immune_opt, Islet = N_Islet_opt)
}
# Below threshold: signal intensity is SST noise + CD20 noise
else{
# Define optimization bounds. Optimize on Immune noise.
lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
interval <- c(lower_bound1, upper_bound1)
result <- optimize(
f = likelihood_n_islet_n_immune,
interval = interval, # N_Immune cannot exceed the N_Immune + Sigma_Immune
maximum = TRUE,
Immune_Islet_obs = Immune_Islet_obs,
mu_N_Immune = mu_N_Immune,
sigma_N_Immune = sigma2_N_Immune,
mu_N_Islet = mu_N_Islet,
sigma_N_Islet = sigma2_N_Islet
)
# Extract the results
N_Immune_opt <- result$maximum
N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
tibble(Immune = N_Immune_opt, Islet = N_Islet_opt)
}
}) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_immune_optim <- results_immun |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Immune` -> `Immune...9`
• `Islet` -> `Islet...10`
colnames(results_immune_optim) <- channels_uncompressed
results_immune_optim
# A tibble: 12,653 × 10
CD44 GCG CD8a INS CD4 ProINS CD3e NKX6_1 CD20 SST
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.00431 0.115 0.0167 -0.00851 0.0142 0.0503 0.0249 -0.0115 0.00284 0.0161
2 0.00431 0.00461 0.0167 0.0252 0.0166 0.0338 0.0243 0.0464 0.00297 0.00686
3 0.339 0.0221 0.0167 0.0114 0.0142 0.0914 0.0249 0.00609 0.00297 0.00908
4 0.295 0.0221 0.0167 0.0565 0.0142 0.0940 0.0249 0.00740 0.00288 0.0202
5 0.00431 0.149 0.0167 0.0260 0.0166 0.115 0.0249 0.0248 0.00297 0.00358
6 0.00393 0.0156 0.0167 0.00962 0.0142 0.0599 0.0249 0.0183 0.00297 0.0114
7 0.00431 0.00996 0.0162 0.0290 0.0142 0.0880 0.0243 0.0396 0.00297 0.0156
8 0.00431 0.0370 0.0167 0.0185 0.0142 0.0867 0.0243 0.0359 0.00297 0.0393
9 0.00431 0.0402 0.0167 0.0539 0.0142 0.0741 0.0249 0.0113 0.00297 0.0150
10 0.00431 0.0972 0.0167 0.0239 0.0142 0.0887 0.0243 0.0345 0.00297 0.0103
# ℹ 12,643 more rows
Check the results. Note: given that the Immune Channel usually has less signal, the decomposition is more challenging. -> Unfortunately, especially CD20 and CD3e dont have great signal.
channel_oi <- "CD8a_INS"
# Split channels_oi into Immune and Islet
channel_oi2 <- channel_oi |> stringr::str_split("_") |> unlist()
channel_Immune <- channel_oi2[1]
channel_Islet <- ifelse(channel_oi == "CD3e_NKX6_1", "NKX6_1", channel_oi2[2])
## Get parameters for the channel of interest
mu_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Islet")
mu_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_S_Immune")
mu_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Islet")
mu_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("mu_N_Immune")
threshold_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Immune")
threshold_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("threshold_Islet")
sigma2_N_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Immune")
sigma2_N_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_N_Islet")
sigma2_S_Immune <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Immune")
sigma2_S_Islet <- full_coeffs |> dplyr::filter(channel == channel_oi) |> pull("sigma2_S_Islet")
# Compressed channel:
ggplot(tibble(x = assay(sce_immune, "exprs")[channel_oi, ]), aes(x=x)) + geom_density() +
geom_vline(xintercept = mu_S_Immune, color = "red") +
geom_vline(xintercept = threshold_Immune, color = "green") +
geom_vline(xintercept = mu_N_Immune, color = "blue")
# Noise Islet Channel
ggplot(results_immune_optim, aes(x = get(channel_Islet))) + geom_density() +
geom_vline(xintercept = mu_N_Islet, color = "blue") +
ggtitle("Islet")
# Signal Immune Channel
ggplot(results_immune_optim, aes(x = get(channel_Immune))) + geom_density() +
geom_vline(xintercept = mu_S_Immune, color = "red") +
geom_vline(xintercept = threshold_Immune, color = "green") +
geom_vline(xintercept = mu_N_Immune, color = "blue") +
ggtitle("Immune")
Assign the counts.
df <- as.data.frame(t(assay(sce_immune, "exprs")))
# Now add the decomposed channels.
# We do not yet remove the compressed channels.
df_decomposed <- df |>
mutate(CD44 = results_immune_optim$CD44) |>
mutate(CD8a = results_immune_optim$CD8a) |>
mutate(CD4 = results_immune_optim$CD4) |>
mutate(CD3e = results_immune_optim$CD3e) |>
mutate(CD20 = results_immune_optim$CD20) |>
mutate(GCG = results_immune_optim$GCG) |>
mutate(INS = results_immune_optim$INS) |>
mutate(SST = results_immune_optim$SST) |>
mutate(NKX6_1 = results_immune_optim$NKX6_1) |>
mutate(ProINS = results_immune_optim$ProINS)
# Uncomrpessed panel now as rowData.
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)
# Create a new SCE object.
sce_immune_2 <- SpatialExperiment(
assays = list(exprs = as.matrix(t(df_decomposed))),
colData = colData(sce_immune),
spatialCoords = spatialCoords(sce_immune),
metadata = metadata(sce_immune))
sce_immune_2
class: SpatialExperiment
dim: 37 12653
metadata(2): spillover_matrix subset
assays(1): exprs
rownames(37): H3 CD44_GCG ... NKX6_1 ProINS
rowData names(0):
colnames(12653): 6238_Compressed_001_12 6238_Compressed_001_17 ...
6396_Compressed_030_1440 6396_Compressed_030_1444
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(0):
sce_stroma$sample_id <- paste0(sce_stroma$sample_id, "_stroma")
sce_exo$sample_id <- paste0(sce_exo$sample_id, "_exo")
# sce_other$sample_id <- paste0(sce_other$sample_id, "_other")
sce_rest <- cbind(sce_stroma, sce_exo, sce_other)
test_df <- t(assay(sce_rest, "exprs")) |>
tibble::as_tibble(rownames = "cell_id") |>
select(all_of(channels_compressed), cell_id) |>
tidyr::pivot_longer(cols = channels_compressed, names_to = "channel", values_to = "value")
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
# Was:
data %>% select(channels_compressed)
# Now:
data %>% select(all_of(channels_compressed))
See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
test2_df <- full_coeffs |>
mutate(Immune_Islet_N_frac = mu_N_Immune/mu_N_Islet) |>
mutate(Islet_Immune_N_frac = mu_N_Islet/mu_N_Immune) |>
select(channel, Immune_Islet_N_frac, Islet_Immune_N_frac)
calc_df <- test_df |>
left_join(test2_df, by = "channel") |>
# change CD3_NKX6_1 to CD3_NKX6.1
mutate(channel = ifelse(channel == "CD3_NKX6_1", "CD3_NKX6.1", channel)) |>
tidyr::separate(channel, into = c("Immune_ch", "Islet_ch"), sep = "_") |>
mutate(Immune_N = Immune_Islet_N_frac * value) |>
mutate(Islet_N = Islet_Immune_N_frac * value)
Warning: Expected 2 pieces. Additional pieces discarded in 105977 rows [4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64,
69, 74, 79, 84, 89, 94, 99, ...].
calc_df_2 <- calc_df |>
select(-c(Immune_Islet_N_frac, Islet_Immune_N_frac, value)) |>
tidyr::pivot_wider(id_cols = cell_id, names_from = c(Immune_ch, Islet_ch), values_from = c(Immune_N, Islet_N))
results_rest_optim <- calc_df_2 |>
# rename from Islet_N_CD20_SST to SST
# rename from Immune_N_CD20_SSt to CD20
# rename from Islet_N_CD3_NKX6.1 to NKX6.1
# rename from Immune_N_CD3_NKX6.1 to CD3
dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD44_GCG", "CD44")) |>
dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD8a_INS", "CD8a")) |>
dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD4_ProINS", "CD4")) |>
dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD3_NKX6.1", "CD3")) |>
dplyr::rename_with(~stringr::str_replace(., "Immune_N_CD20_SST", "CD20")) |>
dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD20_SST", "SST")) |>
dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD3_NKX6.1", "NKX6.1")) |>
dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD8a_INS", "INS")) |>
dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD4_ProINS", "ProINS")) |>
dplyr::rename_with(~stringr::str_replace(., "Islet_N_CD44_GCG", "GCG"))
set.seed(222)
results_rest <-
purrr::pmap(full_coeffs, \(mu_S_Immune, mu_S_Islet, mu_N_Immune, mu_N_Islet, sigma2_S_Immune, sigma2_S_Islet, sigma2_N_Immune, sigma2_N_Islet, channel, threshold_Islet, threshold_Immune) {
message(channel)
# Optimization Interval.
lower_bound1 <- ifelse(mu_N_Immune - sigma2_N_Immune < 0, 0, mu_N_Immune - sigma2_N_Immune)
upper_bound1 <- ifelse(mu_N_Immune + sigma2_N_Immune > threshold_Immune, threshold_Immune, mu_N_Immune + sigma2_N_Immune)
interval <- c(lower_bound1, upper_bound1)
# Iterate over cells.
furrr::future_map(1:length(colnames(sce_rest)), \(i) {
Immune_Islet_obs <- assay(sce_rest, "exprs")[channel, ][i]
result <- optimize(
f = likelihood_n_islet_n_immune,
interval = interval, # N_Immune cannot exceed the N_Immune + Sigma_Immune
maximum = TRUE,
Immune_Islet_obs = Immune_Islet_obs,
mu_N_Immune = mu_N_Immune,
sigma_N_Immune = sigma2_N_Immune,
mu_N_Islet = mu_N_Islet,
sigma_N_Islet = sigma2_N_Islet
)
# Extract the results
N_Immune_opt <- result$maximum
N_Islet_opt <- Immune_Islet_obs - N_Immune_opt
tibble(Immune = N_Immune_opt, Islet = N_Islet_opt)
}) |> bind_rows()
})
CD44_GCG
CD8a_INS
CD4_ProINS
CD3e_NKX6_1
CD20_SST
results_rest_optim <- results_rest |> bind_cols()
New names:
• `Immune` -> `Immune...1`
• `Islet` -> `Islet...2`
• `Immune` -> `Immune...3`
• `Islet` -> `Islet...4`
• `Immune` -> `Immune...5`
• `Islet` -> `Islet...6`
• `Immune` -> `Immune...7`
• `Islet` -> `Islet...8`
• `Immune` -> `Immune...9`
• `Islet` -> `Islet...10`
colnames(results_rest_optim) <- channels_uncompressed
results_rest_optim
# A tibble: 105,977 × 10
CD44 GCG CD8a INS CD4 ProINS CD3e NKX6_1 CD20 SST
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.00431 0.0516 0.0167 0.00229 0.0166 0.123 0.0249 -6.10e-3 0.00297 0.00615
2 0.00431 0.0590 0.0167 0.00816 0.0166 0.136 0.0243 4.18e-2 0.00297 0.00657
3 0.00431 0.0272 0.0167 0.0251 0.0142 0.0644 0.0249 -8.95e-3 0.00297 0.00586
4 0.00431 0.0381 0.0167 0.00550 0.0142 0.0692 0.0243 3.48e-2 0.00297 0.00651
5 0.00431 0.00502 0.0162 0.0321 0.0142 0.0567 0.0249 6.37e-3 0.00297 0.00686
6 0.00431 0.0640 0.0167 0.0186 0.0166 0.0150 0.0249 1.89e-2 0.00297 0.00724
7 0.00431 0.00845 0.0162 0.0300 0.0166 0.0435 0.0249 5.18e-2 0.00297 -0.00297
8 0.00431 0.146 0.0167 0.00549 0.0166 0.0128 0.0249 1.94e-2 0.00297 0.00206
9 0.00393 0.0180 0.0162 0.0338 0.0166 0.130 0.0249 2.18e-2 0.00297 0.0226
10 0.00431 0.0324 0.0167 0.0111 0.0166 0.0422 0.0249 1.96e-4 0.00297 0.0258
# ℹ 105,967 more rows
df <- as.data.frame(t(assay(sce_rest, "exprs")))
# Now add the decomposed channels.
df_decomposed <- df |>
mutate(CD44 = results_rest_optim$CD44) |>
mutate(CD8a = results_rest_optim$CD8a) |>
mutate(CD4 = results_rest_optim$CD4) |>
mutate(CD3e = results_rest_optim$CD3e) |>
mutate(CD20 = results_rest_optim$CD20) |>
mutate(GCG = results_rest_optim$GCG) |>
mutate(INS = results_rest_optim$INS) |>
mutate(SST = results_rest_optim$SST) |>
mutate(NKX6_1 = results_rest_optim$NKX6_1) |>
mutate(ProINS = results_rest_optim$ProINS)
# Uncomrpessed panel now as rowData.
fn_panel <- file.path(paths$folder_in, paste0("panel_", "Uncompressed", ".csv"))
panel <- read.csv(fn_panel, row.names = 1)
# Create a new SCE object.
sce_rest_2 <- SpatialExperiment(
assays = list(exprs = as.matrix(t(df_decomposed))),
colData = colData(sce_rest),
spatialCoords = spatialCoords(sce_rest),
metadata = metadata(sce_rest))
sce_rest_2
class: SpatialExperiment
dim: 37 105977
metadata(6): spillover_matrix subset ... spillover_matrix subset
assays(1): exprs
rownames(37): H3 CD44_GCG ... NKX6_1 ProINS
rowData names(0):
colnames(105977): 6238_Compressed_001_5 6238_Compressed_001_19 ...
6396_Compressed_030_1443 6396_Compressed_030_1445
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(0):
sce <- cbind(sce_isl_2, sce_immune_2, sce_rest_2)
'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.
fn <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, "sce.rds"))
saveRDS(sce, fn)
# sce <- readRDS(fn); sce_rest_2 <- sce[, sce$cell_category %in% c("Exocrine", "Stroma", "Other")];
# sce_immune_2 <- sce[, sce$cell_category == "Immune"]; sce_isl_2 <- sce[, sce$cell_category == "Islet"]
We now perform cell type annotation in 3 steps. First, for the Islet cells using the decompressed data.
RUN UMAP.
sce_sub <- sce[, metadata(sce)[["subset"]]]
cur_assay <- "exprs"
dimred_name <- paste("UMAP", cur_assay, sep = "_")
print(dimred_name)
[1] "UMAP_exprs"
# Run UMAP on a cell subset
# Extract Counts.
if (cur_assay %in% assayNames(sce_sub)) {
counts <- t(assay(sce_sub, cur_assay))
}
# Run UMAP.
umap_model <- uwot::umap(counts, ret_model = TRUE)
# Extract Embedding.
cur_umap <- umap_model$embedding
colnames(cur_umap) <- c("UMAP1", "UMAP2")
rownames(cur_umap) <- rownames(counts)
reducedDim(sce_sub, dimred_name) <- cur_umap
cur_dimred <- "UMAP_exprs"
cur_dat <- scuttle::makePerCellDF(sce_sub, features = rownames(sce_sub),
exprs_values = cur_assay) |>
dplyr::select(c(all_of(channels_uncompressed), paste0(dimred_name, ".1"),
paste0(dimred_name, ".2")))
cur_dat <- cur_dat |>
tidyr::pivot_longer(cols = all_of(channels_uncompressed),
names_to = "channel",
values_to = cur_assay)
# Plot marker expression
p <- plot_dim_red_channels(cur_dat, dimred_name, cur_assay,
channels_uncompressed, force_points = TRUE)
print(p)
fn <- paste0(paste(today, "Categories_Decomposed", cur_dimred,
sep = "_"), ".png")
do.call(ggsave, c(list(fn, p), plotsave_param))
channels_islet <- c("GCG", "PCSK2", # Alpha-Cells:
"INS", "SST", "NKX6_1", "ProINS", "IAPP", # Beta-cells:
"SST", "PDX1", # Delta-cells. PDX1 also beta-cells
"PPY", # Gamma cells.
"SYP", "CD99", "Ecdh", "CD31" # General markers and included just in case.
)
sce_isl_3 <- sce[channels_islet, sce$cell_category == "Islet"]
sce_isl_3
class: SpatialExperiment
dim: 14 20999
metadata(10): spillover_matrix subset ... spillover_matrix subset
assays(1): exprs
rownames(14): GCG PCSK2 ... Ecdh CD31
rowData names(0):
colnames(20999): 6238_Compressed_001_28 6238_Compressed_001_293 ...
6396_Compressed_030_1410 6396_Compressed_030_1439
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id
clust_method <- c("Pheno1")
cur_assay <- "exprs"
# Number of nearest-neighbors
k <- 30
##
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))
Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_isl_3))) {
set.seed(seed)
cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_isl_3, cur_assay)), k = k)
cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
colnames(cur_pheno) <- clust_name
rownames(cur_pheno) <- colnames(assay(sce_isl_3, cur_assay))
# Add Phenograph clusters to the colData of the SCE object
# Cluster `0` is attributed to non-subsetted cells and not islet cells
colData(sce_isl_3)[, clust_name] <- cur_pheno
remove(cur_pheno)
}
Run Rphenograph starts:
-Input data of 20999 rows and 14 columns
-k is set to 30
Finding nearest neighbors...DONE ~ 4.557 s
Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 0.947 s
Start jaccard
DONE ~ 0.025 s
Build undirected graph from the weighted links...DONE ~ 0.189 s
Run louvain clustering on the graph ...DONE ~ 1.922 s
Run Rphenograph DONE, totally takes 6.69300000000021s.
Return a community class
-Modularity value: 0.8225092
-Number of clusters: 20
Quickly check cluster abundances.
clust_freq <- table(colData(sce_isl_3)[[clust_name]])
clust_freq
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
734 601 845 2298 1329 817 2746 872 699 1212 920 1298 901 623 904 1016
17 18 19 20
1795 1186 168 35
cur_method <- "Pheno1"
name_cur_assay <- "exprs"
cur_assay <- "exprs"
clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(name_cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_isl_3,
expr_values = name_cur_assay,
cluster_by = clust_name,
channels = channels_islet)
# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "ISLET", "Heatmap",
sep = "_"), ".html")
heatmaply::heatmaply(
heatmaply::normalize(hm), main = clust_name,
file = file.path(paths$folder_script, fn))
Beta: 6, 12, 7, 9, 14, 13, 10, 11 Delta: 2, 1 Gamma: 20 alpha-cells: 19, 17, 15, 18, 16, 3, 8 Other: 5, alpha/beta mixed: 4
13, 10, 11
clust_methods <- c("Pheno1")
clust_assay <- c("exprs")
clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
clust_other <- c(5, 4, 20)
clust_delta <- c(2, 1)
clust_beta <- c(6, 12, 7, 9, 14, 13, 10)
clust_gamma <- c()
clust_alpha <- c(19, 17, 15, 18, 16, 3, 8, 11)
all_clust <- sort(c(clust_other, clust_delta, clust_beta, clust_gamma,
clust_alpha))
if ((!length(unique(all_clust)) ==
length(unique(colData(sce_isl_3)[, clust_name]))) ||
any(duplicated(all_clust))) {
stop("Recheck cluster attribution")
}
# Add cell types to the SCE object
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_beta,
"cell_type"] <- "Beta"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_alpha,
"cell_type"] <- "Alpha"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_delta,
"cell_type"] <- "Delta"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_gamma,
"cell_type"] <- "Gamma"
colData(sce_isl_3)[colData(sce_isl_3)[, clust_name] %in% clust_other,
"cell_type"] <- "Other"
# Quick Check Cell Types.
colData(sce_isl_3) |>
as_tibble() |>
dplyr::count(case_id, cell_type, donor_type) |>
arrange(cell_type)
# A tibble: 12 × 4
case_id cell_type donor_type n
<chr> <chr> <fct> <int>
1 6238 Alpha NoDiabetes 1428
2 6271 Alpha NoDiabetes 420
3 6396 Alpha RecentOnset 5858
4 6238 Beta NoDiabetes 4527
5 6271 Beta NoDiabetes 2602
6 6396 Beta RecentOnset 1167
7 6238 Delta NoDiabetes 350
8 6271 Delta NoDiabetes 66
9 6396 Delta RecentOnset 919
10 6238 Other NoDiabetes 1434
11 6271 Other NoDiabetes 450
12 6396 Other RecentOnset 1778
imgloader <- function(x, image_dir, image_names,
suffix_rem = "", suffix_add = "",
bit_depth = 16, type, ...) {
require(cytomapper)
image_list <- file.path(image_dir, image_names)
# Test if the image list exist
test_exist <- which(!file.exists(image_list))
if (length(test_exist) > 0) {
stop(c("The following images were not found:\n",
paste(image_list[test_exist], collapse = "\n")))
} else {
# Load and scale the images
images <- loadImages(image_list, ...)
# images <- scaleImages(images, (2 ^ bit.depth) - 1)
# Add image names to metadata
mcols(images)$ImageName <- gsub(suffix_rem, "", names(images))
mcols(images)$ImageName <- paste0(mcols(images)$ImageName, suffix_add)
# Add channel names
if (type == "stacks") {
print("Loading image stacks")
channelNames(images) <- rownames(x)
}
return(images)
}
}
colData(sce_isl)$Pheno1_exprs <- sce_isl_3[, colnames(sce_isl)]$Pheno1_exprs
viz_clust <- c(9, 10, 5, 1, 3, 2, 6) # Select cluster(s) to visualize.
viz_method <- "Pheno1"
viz_assay <- "exprs"
if (!is.null(viz_clust)) {
nb_images <- 14
image_extension <- ".tiff"
clust_name <- paste(viz_method, viz_assay, sep = "_")
# Subset the SCE
sce_viz <- sce_isl[, colData(sce_isl)[[clust_name]] %in% viz_clust]
# Select random image.
set.seed(seed)
image_sub <- sort(sample(
unique(sce_viz$image_fullname),
min(length(unique(sce_viz$image_fullname)), nb_images)))
# Folders
folder_images <- file.path(paths$folder_in, "img", paths$panel_type)
folder_masks <- file.path(paths$folder_in, "masks_cells",
paths$panel_type, "whole-cell")
# Load images and masks
images <- imgloader(
x = sce_viz,
image_dir = folder_images,
image_names = image_sub,
type = "stacks"
)
masks <- imgloader(
x = sce_viz,
image_dir = folder_masks,
image_names = image_sub,
as.is = TRUE,
type = "masks"
)
sce_viz <- sce_viz[, sce_viz$image_fullname %in% image_sub]
sce_viz$ImageName <- gsub(image_extension, "", sce_viz$image_fullname)
}
Loading required package: cytomapper
Loading required package: EBImage
Attaching package: 'EBImage'
The following object is masked from 'package:purrr':
transpose
The following object is masked from 'package:data.table':
transpose
The following object is masked from 'package:SummarizedExperiment':
resize
The following object is masked from 'package:Biobase':
channel
The following objects are masked from 'package:GenomicRanges':
resize, tile
The following objects are masked from 'package:IRanges':
resize, tile
Attaching package: 'cytomapper'
The following objects are masked from 'package:Biobase':
channelNames, channelNames<-
[1] "Loading image stacks"
FIXME: Make channel decomposition on pixel level aswell, such that we can visualize the decomposed channels.
# Use cytoviewer with images, masks and object
library(cytoviewer)
if (!is.null(viz_clust)) {
channels_view <- rownames(sce_isl)
sub_images <- cytomapper::getChannels(images, channels_view)
app <- cytoviewer(image = sub_images,
mask = masks,
object = sce_viz[channels_view, ],
img_id = "ImageName",
cell_id = "cell_number")
if (interactive()) {
shiny::runApp(app)
}
}
channels_immune <- c("CD3e", "CD4", "CD8a", "CD45RA", "FOXP3", "CD45RO", # T-cells:
"CD20", # B-cells. Likely almost absent.
"CD44",
"CD68", # Macrophages.
"MPO") # Neutrophils.
sce_immune_3 <- sce_immune_2[channels_immune, ]
##
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))
Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_immune_3))) {
set.seed(seed)
cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_immune_3, cur_assay)), k = k)
cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
colnames(cur_pheno) <- clust_name
rownames(cur_pheno) <- colnames(assay(sce_immune_3, cur_assay))
# Add Phenograph clusters to the colData of the SCE object
# Cluster `0` is attributed to non-subsetted cells and not islet cells
colData(sce_immune_3)[, clust_name] <- cur_pheno
remove(cur_pheno)
}
Run Rphenograph starts:
-Input data of 12653 rows and 10 columns
-k is set to 30
Finding nearest neighbors...DONE ~ 2.902 s
Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 0.554 s
Start jaccard
DONE ~ 0.014 s
Build undirected graph from the weighted links...DONE ~ 0.123 s
Run louvain clustering on the graph ...DONE ~ 0.64 s
Run Rphenograph DONE, totally takes 3.67900000000009s.
Return a community class
-Modularity value: 0.8723812
-Number of clusters: 23
Quickly check cluster abundances.
clust_freq <- table(colData(sce_immune_3)[[clust_name]])
clust_freq
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
426 1424 823 697 560 186 704 1030 989 389 341 442 424 1189 323 566
17 18 19 20 21 22 23
640 342 444 145 303 158 108
cur_method <- "Pheno1"
name_cur_assay <- "exprs"
cur_assay <- "exprs"
clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(name_cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_immune_3,
expr_values = name_cur_assay,
cluster_by = clust_name,
channels = channels_immune)
# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "ISLET", "Heatmap",
sep = "_"), ".html")
heatmaply::heatmaply(
heatmaply::normalize(hm), main = clust_name,
file = file.path(paths$folder_script, fn))
T-CD8: 13 T-CD4: 22, NK: 6 Other: 5, 10, 7, 19, 11, 3, 14, 9, 2, 18, 12, 21 Neutrophil: 20, 15 Mos: 23, 17, 16, 1
clust_methods <- c("Pheno1")
clust_assay <- c("exprs")
clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
clust_tcd4 <- c(22)
clust_tcd8 <- c(13) # 14 confirmed.
clust_tdn <- c()
clust_other <- c(6, 5, 10, 7, 19, 11, 3, 14, 9, 2, 18, 12, 21)
clust_B <- c()
clust_neutrophil <- c(20, 15) # confirmed.
clust_macrophage <- c(23, 17, 16, 1, 4, 8) # 15, 7 confirmed.
all_clust <- sort(c(clust_tcd4, clust_tcd8, clust_tdn, clust_B, clust_neutrophil,
clust_macrophage, clust_other))
if ((!length(unique(all_clust)) ==
length(unique(colData(sce_immune_3)[, clust_name]))) ||
any(duplicated(all_clust))) {
stop("Recheck cluster attribution")
}
# Add cell types to the SCE object
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tcd4,
"cell_type"] <- "T_CD4"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tcd8,
"cell_type"] <- "T_CD8"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_B,
"cell_type"] <- "B"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_neutrophil,
"cell_type"] <- "Neutrophil"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_macrophage,
"cell_type"] <- "Myeloid"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_other,
"cell_type"] <- "Other"
colData(sce_immune_3)[colData(sce_immune_3)[, clust_name] %in% clust_tdn,
"cell_type"] <- "T_DN"
# Quick Check Cell Types.
colData(sce_immune_3) |>
as_tibble() |>
dplyr::count(case_id, cell_type, donor_type) |>
arrange(cell_type)
# A tibble: 15 × 4
case_id cell_type donor_type n
<chr> <chr> <fct> <int>
1 6238 Myeloid NoDiabetes 572
2 6271 Myeloid NoDiabetes 185
3 6396 Myeloid RecentOnset 2710
4 6238 Neutrophil NoDiabetes 224
5 6271 Neutrophil NoDiabetes 19
6 6396 Neutrophil RecentOnset 225
7 6238 Other NoDiabetes 1703
8 6271 Other NoDiabetes 366
9 6396 Other RecentOnset 6067
10 6238 T_CD4 NoDiabetes 10
11 6271 T_CD4 NoDiabetes 3
12 6396 T_CD4 RecentOnset 145
13 6238 T_CD8 NoDiabetes 63
14 6271 T_CD8 NoDiabetes 46
15 6396 T_CD8 RecentOnset 315
# B-cells likely wrong annotaiton.
colData(sce_immune)$Pheno1_exprs <- sce_immune_3[, colnames(sce_immune)]$Pheno1_exprs
viz_clust <- c(1:19) # Select cluster(s) to visualize.
viz_method <- "Pheno1"
viz_assay <- "exprs"
if (!is.null(viz_clust)) {
nb_images <- 14
image_extension <- ".tiff"
clust_name <- paste(viz_method, viz_assay, sep = "_")
# Subset the SCE
sce_viz <- sce_immune[, colData(sce_immune)[[clust_name]] %in% viz_clust]
# Select random image.
set.seed(seed)
image_sub <- sort(sample(
unique(sce_viz$image_fullname),
min(length(unique(sce_viz$image_fullname)), nb_images)))
# Folders
folder_images <- file.path(paths$folder_in, "img", paths$panel_type)
folder_masks <- file.path(paths$folder_in, "masks_cells",
paths$panel_type, "whole-cell")
# Load images and masks
images <- imgloader(
x = sce_viz,
image_dir = folder_images,
image_names = image_sub,
type = "stacks"
)
masks <- imgloader(
x = sce_viz,
image_dir = folder_masks,
image_names = image_sub,
as.is = TRUE,
type = "masks"
)
sce_viz <- sce_viz[, sce_viz$image_fullname %in% image_sub]
sce_viz$ImageName <- gsub(image_extension, "", sce_viz$image_fullname)
}
[1] "Loading image stacks"
# Use cytoviewer with images, masks and object
library(cytoviewer)
if (!is.null(viz_clust)) {
channels_view <- rownames(sce_immune)
sub_images <- cytomapper::getChannels(images, channels_view)
app <- cytoviewer(image = sub_images,
mask = masks,
object = sce_viz[channels_view, ],
img_id = "ImageName",
cell_id = "cell_number")
if (interactive()) {
shiny::runApp(app)
}
}
channels_exo <- c("Ecdh",
"CD44", # Pan-Exo
"CK19",
"PDX1", # Ductal cells
"SMA", # Smooth Muscle; Activated Fibroblasts.
"AMY", # Acinar
"CD31", # Endothelial)
"CA9")
sce_exo_3 <- sce_rest_2[channels_exo, ]
##
clust_name <- paste(clust_method, cur_assay, sep = "_")
writeLines(c("\n", clust_name))
Pheno1_exprs
# Run Phenograph
if (!clust_name %in% colnames(colData(sce_exo_3))) {
set.seed(seed)
cur_pheno_annoy <- Rphenoannoy::Rphenoannoy(t(assay(sce_exo_3, cur_assay)), k = k)
cur_pheno <- DataFrame(cur_pheno_annoy[[2]]$membership)
colnames(cur_pheno) <- clust_name
rownames(cur_pheno) <- colnames(assay(sce_exo_3, cur_assay))
# Add Phenograph clusters to the colData of the SCE object
# Cluster `0` is attributed to non-subsetted cells and not islet cells
colData(sce_exo_3)[, clust_name] <- cur_pheno
remove(cur_pheno)
}
Run Rphenograph starts:
-Input data of 105977 rows and 8 columns
-k is set to 30
Finding nearest neighbors...DONE ~ 27.24 s
Compute jaccard coefficient between nearest-neighbor sets...
Presorting knn...
presorting DONE ~ 4.8 s
Start jaccard
DONE ~ 0.088 s
Build undirected graph from the weighted links...DONE ~ 1.359 s
Run louvain clustering on the graph ...DONE ~ 16.759 s
Run Rphenograph DONE, totally takes 45.445999999999s.
Return a community class
-Modularity value: 0.8709948
-Number of clusters: 31
Quickly check cluster abundances.
clust_freq <- table(colData(sce_exo_3)[[clust_name]])
clust_freq
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
3547 6064 4225 3413 1932 4543 4243 3365 2925 7121 6531 3551 3476 1942 3807 3224
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
4745 822 378 4501 4477 4721 2192 2630 2199 4853 3170 3008 923 2554 895
cur_assay <- "exprs"
clust_name <- paste(cur_method, cur_assay, sep = "_")
message(clust_name)
Pheno1_exprs
message(cur_assay)
exprs
# Summarize the data
hm <- summarize_heatmap(sce_exo_3, expr_values = cur_assay,
cluster_by = clust_name,
channels = channels_exo)
hm
Ecdh CD44 CK19 PDX1 SMA AMY
1 0.14983135 0.004307549 0.01328572 0.02533717 0.14234041 0.09162547
2 0.56939991 0.004307549 0.01476372 0.05925981 0.04105287 0.28725911
3 0.38839434 0.004307549 0.02490855 0.04982429 0.35212821 0.29453913
4 0.84819587 0.004307549 0.01653588 0.06872280 0.04058910 0.25147103
5 0.13005991 0.004307549 0.01735473 0.02324715 0.84666689 0.06942178
6 0.56782040 0.004307549 0.01687559 0.06050042 0.04176513 0.51704293
7 0.51647498 0.004307549 0.17747254 0.08697404 0.05301493 0.27610286
8 1.04273284 0.004307549 0.27874838 0.30375619 0.05810918 0.30637757
9 0.43874011 0.004307549 0.97234507 0.31414511 0.07639490 0.15743609
10 0.39658611 0.004307549 0.43288148 0.29333855 0.07877719 0.17672357
11 0.79485353 0.004307549 0.67719794 0.36241759 0.07182983 0.22950880
12 0.22304566 0.004307549 0.01660130 0.03584923 0.05630251 0.11341332
13 1.34244566 0.004307549 0.43861439 0.52446691 0.06858005 0.67975774
14 1.26597971 0.004307549 0.02556046 0.08819125 0.04157502 0.58500955
15 0.18296583 0.004307549 0.01474290 0.04211528 0.04210795 0.28617931
16 0.74971857 0.004307549 0.65987668 0.36204914 0.06291548 0.64949101
17 0.61371305 0.004307549 1.56690807 0.30642285 0.06908359 0.16358277
18 0.27775545 0.004307549 0.01213637 0.05373568 0.08947229 0.77244758
19 0.08853996 0.004307549 0.02137439 0.01808019 2.49523755 0.05411067
20 0.33681775 0.004307549 0.01656282 0.05257284 0.03956788 0.52691301
21 0.31682454 0.004307549 0.01418684 0.07040967 0.04209082 0.95343344
22 0.36677566 0.004307549 0.01535156 0.05435656 0.04041413 0.30964799
23 0.43284893 0.004307549 0.02214219 0.04407031 0.07239359 0.29185655
24 0.84049130 0.004307549 0.01509659 0.06838766 0.03518946 0.47387274
25 0.68400119 0.004307549 0.01730337 0.07789659 0.04102914 0.89825874
26 1.10559303 0.004307549 0.31976628 0.35378377 0.05720911 1.36074764
27 0.60419270 0.004307549 0.01411833 0.08822477 0.04136256 1.24577265
28 0.36906133 0.004307549 0.01393544 0.08254205 0.04317645 1.47406134
29 0.97007965 0.004307549 0.01613171 0.09898964 0.04055951 1.57125991
30 0.54368739 0.004307549 0.01730896 0.08841986 0.04546984 1.92461585
31 2.10722403 0.004307549 1.78213986 0.60652077 0.07925724 0.72872404
CD31 CA9
1 0.15397941 0.01774506
2 0.02431137 0.02432417
3 0.02763228 0.04316357
4 0.02594964 0.02733137
5 0.04857285 0.02196356
6 0.02597827 0.02803190
7 0.02045637 0.03582682
8 0.01773869 0.03540149
9 0.01523137 0.04458512
10 0.01505070 0.03727501
11 0.01612048 0.04868868
12 0.01738729 0.02058095
13 0.02320205 0.04393136
14 0.03458049 0.04353523
15 0.01868236 0.02136079
16 0.01915734 0.04133103
17 0.01604831 0.06105087
18 0.22987077 0.02166004
19 0.08581290 0.01848027
20 0.02320499 0.02525009
21 0.03214560 0.02756279
22 0.02208901 0.02236137
23 0.02028548 0.23827565
24 0.02894051 0.02629785
25 0.03202350 0.03235486
26 0.02598490 0.04115513
27 0.04062754 0.03276893
28 0.04185566 0.03171415
29 0.04615143 0.03509572
30 0.04429425 0.03516834
31 0.03797643 0.15089170
# Display the heatmap
fn <- paste0(paste(today, "Clusters", clust_name, "EXOCRINE", "Heatmap",
sep = "_"), ".html")
heatmaply::heatmaply(
hm, main = clust_name,
file = file.path(paths$folder_script, fn))
sce_rest
class: SpatialExperiment
dim: 27 105977
metadata(6): spillover_matrix subset ... spillover_matrix subset
assays(6): counts compcounts ... scaled fastMNN_case_id
rownames(27): H3 CD44_GCG ... DNA3 PPY
rowData names(10): channel metal ... shortname channel_name
colnames(105977): 6238_Compressed_001_5 6238_Compressed_001_19 ...
6396_Compressed_030_1443 6396_Compressed_030_1445
colData names(32): case_id panel ... Pheno1_scaled_cat cell_category
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : cell_x cell_y
imgData names(1): sample_id
Other: 8, 4, 7, 23, 15, 12, 3 Acinar: 26, 29, 30, 28, 27, 13, 14, 24, 25, 21, 20, 22, 6, 2, Ductal: 31, 16, 11, 10, 8, 17 Fibro_SM: 5, 19 Endothelail: 18, 1
clust_methods <- c("Pheno1")
clust_assay <- c("exprs")
clust_name <- paste(clust_methods, clust_assay, sep = "_")
print(clust_name)
[1] "Pheno1_exprs"
# Define which clusters correspond to which cell types
#clust_other <- c(9, 20, 16, 11)
#clust_acinar <- c(10, 3, 5, 13, 15, 19, 6, 12, 17)
#clust_ductal <- c(2, 8, 18, 4, 7)
#clust_fibro_sm <- c(1)
#clust_endo <- c(14)
clust_other <- c(4, 7, 23, 15, 12, 3)
clust_acinar <- c(26, 29, 30, 28, 27, 13, 14, 24, 25, 21, 20, 22, 6, 2)
clust_ductal <- c(31, 16, 8:11, 17)
clust_fibro_sm <- c(5, 19)
clust_endo <- c(18, 1)
all_clust <- sort(c(clust_acinar, clust_ductal, clust_fibro_sm, clust_endo, clust_other))
if ((!length(unique(all_clust)) ==
length(unique(colData(sce_exo_3)[, clust_name]))) ||
any(duplicated(all_clust))) {
stop("Recheck cluster attribution")
}
# Add cell types to the SCE object
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_other,
"cell_type"] <- "Other"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_acinar,
"cell_type"] <- "Acinar"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_ductal,
"cell_type"] <- "Ductal"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_fibro_sm,
"cell_type"] <- "Fibro_SM"
colData(sce_exo_3)[colData(sce_exo_3)[, clust_name] %in% clust_endo,
"cell_type"] <- "Endothelial"
sce$cell_type <- "0"
sce[, colnames(sce_isl_3)]$cell_type <- sce_isl_3$cell_type
sce[, colnames(sce_immune_3)]$cell_type <- sce_immune_3$cell_type
sce[, colnames(sce_exo_3)]$cell_type <- sce_exo_3$cell_type
table(sce$cell_type)
Acinar Alpha Beta Delta Ductal Endothelial
49061 7706 8296 1335 28806 4369
Fibro_SM Myeloid Neutrophil Other T_CD4 T_CD8
2310 3467 468 33229 158 424
fn_spe <- file.path(paths$folder_out, paste0(paths$object_type, "_", paths$panel_type, ".rds"))
message(assayNames(sce))
exprs
saveRDS(sce, fn_spe)
# readRDS(fn_spe)
suppressPackageStartupMessages(c(
library(heatmaply),
library(htmltools)#,
#library(cytomapper)
))
assay_sel <- c("exprs")
names(assay_sel) <- c("exprs")
sce_sub$cell_type <- sce[, metadata(spe)$subset]$cell_type
sce_sub <- sce_sub[, sce_sub$cell_type != "Other"]
# Prepare the data
cur_dat <- makePerCellDF(sce_sub, use_dimred = TRUE) |>
arrange(case_id, donor_type)
# Plot
cur_dimred <- "UMAP"
dimred_name <- paste(cur_dimred, cur_assay, sep = "_")
clust_name <- "cell_type"
p <- plot_dim_red(cur_dat, dimred_name, clust_name,
sample = TRUE, size = 1, alpha = 1)
print(p)
fn <- paste0(paste(today, clust_name, cur_dimred,
sep = "_"), ".png")
do.call(ggsave, c(list(fn, p), plotsave_param))
clust_name <- "cell_type"
channels_heatmap <- rownames(sce)[!rownames(sce) %in% c("CD44", "PPY", "CA9", "DNA1", "DNA3", "H3", "PTPRN", "GLUT1", "FOXP3", "CD45RA")]
# Summarize the data
hm <- summarize_heatmap(sce[, metadata(sce)$subset],
expr_values = name_cur_assay,
cluster_by = clust_name,
channels = channels_heatmap)
# Display the heatmap
fn <- paste0(paste(today, clust_name, "Heatmap",
sep = "_"), ".html")
set.seed(1996)
heatmaply(
heatmaply::normalize(hm), main = clust_name,
file = file.path(paths$folder_script, fn))
Problem: - SST. CD45RA. Good: CD8a_INS. -
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] htmltools_0.5.8.1 heatmaply_1.5.0
[3] viridis_0.6.5 viridisLite_0.4.2
[5] plotly_4.10.4 cytoviewer_1.2.0
[7] cytomapper_1.14.0 EBImage_4.44.0
[9] BiocParallel_1.36.0 patchwork_1.2.0
[11] ranger_0.16.0 clustree_0.5.1
[13] ggraph_2.2.1 Rphenoannoy_0.1.0
[15] Matrix_1.6-5 igraph_2.1.4
[17] scran_1.30.2 scater_1.30.1
[19] scuttle_1.12.0 furrr_0.3.1
[21] future_1.49.0 purrr_1.0.2
[23] tictoc_1.2.1 data.table_1.17.2
[25] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0
[27] SummarizedExperiment_1.32.0 Biobase_2.62.0
[29] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[31] IRanges_2.36.0 S4Vectors_0.40.2
[33] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
[35] matrixStats_1.5.0 dplyr_1.1.4
[37] ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 later_1.3.2
[3] bitops_1.0-9 svgPanZoom_0.3.4
[5] tibble_3.2.1 polyclip_1.10-7
[7] lifecycle_1.0.4 edgeR_4.0.16
[9] rprojroot_2.0.4 globals_0.18.0
[11] lattice_0.22-7 MASS_7.3-60
[13] crosstalk_1.2.1 dendextend_1.17.1
[15] magrittr_2.0.3 limma_3.58.1
[17] sass_0.4.9 rmarkdown_2.27
[19] jquerylib_0.1.4 yaml_2.3.10
[21] metapod_1.10.1 httpuv_1.6.15
[23] sp_2.1-4 RColorBrewer_1.1-3
[25] abind_1.4-8 zlibbioc_1.48.2
[27] RCurl_1.98-1.14 tweenr_2.0.3
[29] git2r_0.33.0 seriation_1.5.5
[31] GenomeInfoDbData_1.2.11 ggrepel_0.9.5
[33] irlba_2.3.5.1 listenv_0.9.1
[35] terra_1.7-78 dqrng_0.4.1
[37] parallelly_1.44.0 svglite_2.1.3
[39] DelayedMatrixStats_1.24.0 codetools_0.2-20
[41] DelayedArray_0.28.0 ggforce_0.4.2
[43] tidyselect_1.2.1 raster_3.6-26
[45] farver_2.1.2 ScaledMatrix_1.10.0
[47] TSP_1.2-4 webshot_0.5.5
[49] jsonlite_2.0.0 BiocNeighbors_1.20.2
[51] tidygraph_1.3.1 iterators_1.0.14
[53] systemfonts_1.1.0 foreach_1.5.2
[55] tools_4.3.1 ragg_1.3.2
[57] Rcpp_1.0.14 glue_1.8.0
[59] gridExtra_2.3 SparseArray_1.2.4
[61] xfun_0.52 HDF5Array_1.30.1
[63] ca_0.71.1 shinydashboard_0.7.2
[65] withr_3.0.2 fastmap_1.2.0
[67] rhdf5filters_1.14.1 bluster_1.12.0
[69] fansi_1.0.6 digest_0.6.37
[71] rsvd_1.0.5 mime_0.13
[73] R6_2.6.1 textshaping_0.4.0
[75] colorspace_2.1-1 jpeg_0.1-11
[77] utf8_1.2.5 tidyr_1.3.1
[79] generics_0.1.4 graphlayouts_1.1.1
[81] httr_1.4.7 htmlwidgets_1.6.4
[83] S4Arrays_1.2.1 whisker_0.4.1
[85] uwot_0.2.2 pkgconfig_2.0.3
[87] gtable_0.3.6 registry_0.5-1
[89] workflowr_1.7.1 XVector_0.42.0
[91] fftwtools_0.9-11 scales_1.3.0
[93] png_0.1-8 knitr_1.47
[95] reshape2_1.4.4 rjson_0.2.23
[97] rhdf5_2.46.1 cachem_1.1.0
[99] stringr_1.5.1 shinycssloaders_1.0.0
[101] miniUI_0.1.1.1 vipor_0.4.7
[103] pillar_1.9.0 grid_4.3.1
[105] vctrs_0.6.5 RANN_2.6.2
[107] promises_1.3.0 BiocSingular_1.18.0
[109] beachmat_2.18.1 xtable_1.8-4
[111] cluster_2.1.8.1 archive_1.1.12
[113] beeswarm_0.4.0 evaluate_1.0.3
[115] magick_2.8.3 cli_3.6.5
[117] locfit_1.5-9.9 compiler_4.3.1
[119] rlang_1.1.6 crayon_1.5.3
[121] labeling_0.4.3 mclust_6.1.1
[123] plyr_1.8.9 fs_1.6.6
[125] ggbeeswarm_0.7.2 stringi_1.8.7
[127] nnls_1.6 assertthat_0.2.1
[129] munsell_0.5.1 lazyeval_0.2.2
[131] tiff_0.1-12 colourpicker_1.3.0
[133] sparseMatrixStats_1.14.0 Rhdf5lib_1.24.2
[135] shiny_1.8.1.1 statmod_1.5.0
[137] highr_0.11 fontawesome_0.5.3
[139] memoise_2.0.1 bslib_0.7.0